Symmetric Spaces and Star representations II: Causal Symmetric Spaces
We construct and identify star representations canonically associated with
holonomy reducible simple symplectic symmetric spaces. This leads to a
non-commutative geometric realization of the correspondence between causal
symmetric spaces of Cayley type and Hermitian symmetric spaces of tube type.
Comment: 13 pages
On NP-Hardness of the Paired de Bruijn Sound Cycle Problem
The paired de Bruijn graph, proposed by Medvedev et al., is an extension of the
de Bruijn graph that incorporates mate-pair information for genome assembly. However,
unlike in an ordinary de Bruijn graph, not every path or cycle in a paired de
Bruijn graph will spell a string, because there is an additional soundness
constraint on the path. In this paper we show that the problem of checking whether
there is a sound cycle in a paired de Bruijn graph is NP-hard in the general case.
We also explore some of its special cases, as well as a modified version where
the cycle must also pass through every edge.
Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI 2013)
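To make the soundness constraint concrete, here is a minimal sketch that checks whether one given walk of paired k-mers spells a string. It assumes a simplified model in which each paired k-mer is two k-mers separated by a fixed gap of d characters, and consecutive paired k-mers overlap by k-1 characters on both sides; the helpers `paired_kmers`, `spell` and `spell_paired_walk` are hypothetical names. Verifying a single given walk this way takes linear time; the hardness result above concerns deciding whether any sound cycle exists.

```python
from typing import List, Optional, Tuple

PairedKmer = Tuple[str, str]  # (left k-mer, right k-mer)


def paired_kmers(genome: str, k: int, d: int) -> List[PairedKmer]:
    """All paired k-mers of a genome: two k-mers separated by a gap of d characters."""
    span = 2 * k + d
    return [(genome[i:i + k], genome[i + k + d:i + span])
            for i in range(len(genome) - span + 1)]


def spell(kmers: List[str]) -> str:
    """Glue consecutive k-mers that overlap by k-1 characters."""
    out = kmers[0]
    for prev, cur in zip(kmers, kmers[1:]):
        if prev[1:] != cur[:-1]:
            raise ValueError(f"{prev!r} -> {cur!r} is not a de Bruijn edge")
        out += cur[-1]
    return out


def spell_paired_walk(walk: List[PairedKmer], d: int) -> Optional[str]:
    """Return the string spelled by a walk of paired k-mers, or None if it is unsound.

    Positions covered by neither half (walk shorter than the gap) become 'N'.
    """
    k = len(walk[0][0])
    left = spell([a for a, _ in walk])
    right = spell([b for _, b in walk])
    shift = k + d  # the right half starts k + d characters after the left half
    # Wherever both halves cover the same genome position, they must agree.
    for j in range(shift, len(left)):
        if left[j] != right[j - shift]:
            return None
    filler = "N" * max(0, shift - len(left))
    return left + filler + right[max(0, len(left) - shift):]


walk = paired_kmers("ACGTACGGTCA", k=3, d=1)
print(spell_paired_walk(walk, d=1))  # ACGTACGGTCA
```

Swapping the first right k-mer for one that still forms valid de Bruijn edges but disagrees with the left half (e.g. `("ACG", "TCG")`) makes the function return `None`: each half is a valid walk on its own, yet together they do not spell a string.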
An Efficient Algorithm For Chinese Postman Walk on Bi-directed de Bruijn Graphs
Sequence assembly from short reads is an important problem in biology. It is
known that solving the sequence assembly problem exactly on a bi-directed de
Bruijn graph or a string graph is intractable. However, finding a Shortest
Double-stranded DNA string (SDDNA) containing all the k-long words in the reads
seems to be a good heuristic to get close to the original genome. This problem
is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying
un-weighted bi-directed de Bruijn graph built from the reads. The Chinese
Postman Problem (CPP) is solved by reducing it to a general bi-directed
flow problem on this graph, which runs in $O(|E|^2 \log^2(|V|))$ time. In this paper we show
that the cyclic CPP on bi-directed graphs can be solved without reducing it to
bi-directed flow. We present an $O(p(|V| + |E|)\log(|V|) + (d_{\max} p)^3)$ time
algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph,
where $p = \max\{|\{v : d_{in}(v) - d_{out}(v) > 0\}|, |\{v : d_{in}(v) - d_{out}(v) < 0\}|\}$
and $d_{\max} = \max_v |d_{in}(v) - d_{out}(v)|$. Our algorithm performs asymptotically better than the
bi-directed flow algorithm when the number of imbalanced nodes $p$ is much smaller
than the number of nodes in the bi-directed graph. In our experiments on
various datasets, the value of $p/|V|$ lay between 0.08%
and 0.13% with 95% probability.
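Both parameters in the running-time bound depend only on node imbalances $d_{in}(v) - d_{out}(v)$. A minimal sketch of computing them, assuming an ordinary directed multigraph given as a list of (tail, head) pairs; a real bi-directed graph also carries an orientation at each edge end, which this sketch ignores, and `imbalance_stats` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple


def imbalance_stats(edges: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (p, d_max) for a directed multigraph given as (tail, head) pairs."""
    indeg: Counter = Counter()
    outdeg: Counter = Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    delta = [indeg[v] - outdeg[v] for v in nodes]  # d_in(v) - d_out(v)
    surplus = sum(1 for x in delta if x > 0)  # more incoming than outgoing edges
    deficit = sum(1 for x in delta if x < 0)  # more outgoing than incoming edges
    p = max(surplus, deficit)
    d_max = max((abs(x) for x in delta), default=0)
    return p, d_max


print(imbalance_stats([("a", "b"), ("a", "c"), ("b", "c")]))  # (1, 2)
```

In the example, `a` has two more outgoing than incoming edges and `c` two more incoming than outgoing, so one node is imbalanced on each side ($p = 1$) and $d_{\max} = 2$; a balanced cycle gives $(0, 0)$.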
SEQuel: improving the accuracy of genome assemblies
Motivation: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We developed SEQuel, a tool that corrects errors (insertions, deletions and substitutions) in the assembled contigs. Central to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of the reads into the model.
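The key idea of a positional de Bruijn graph can be sketched by keying each node on a k-mer together with a coarse position, so that identical k-mers from distant parts of a contig remain separate nodes instead of collapsing as in an ordinary de Bruijn graph. This is a minimal sketch, not SEQuel's actual data structure: it assumes each read comes with an approximate start coordinate (for example from aligning it to the contig) and that positions are binned into buckets of a fixed, hypothetical width `bucket`.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (k-mer, approximate position bucket)


def positional_de_bruijn(reads: List[Tuple[str, int]], k: int,
                         bucket: int) -> Dict[Node, Set[Node]]:
    """Adjacency map built from (sequence, approximate start) reads.

    Only nodes with at least one outgoing edge appear as keys.
    """
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    for seq, start in reads:
        nodes = [(seq[i:i + k], (start + i) // bucket)
                 for i in range(len(seq) - k + 1)]
        for u, v in zip(nodes, nodes[1:]):
            graph[u].add(v)  # consecutive k-mers within one read share an edge
    return dict(graph)


g = positional_de_bruijn([("ACGTACGT", 0), ("ACGTACGT", 100)], k=4, bucket=10)
print(("ACGT", 0) in g, ("ACGT", 10) in g)  # True True: same k-mer, two nodes
```

The bucket width trades off robustness to position uncertainty (wider buckets) against the ability to separate nearby repeats (narrower buckets).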
The Fibers and Range of Reduction Graphs in Ciliates
The biological process of gene assembly has been modeled based on three types
of string rewriting rules, called string pointer rules, defined on so-called
legal strings. It has been shown that the reduction graphs of legal strings,
which are based on the notion of the breakpoint graph in the theory of sorting by
reversal, provide valuable insights into the gene assembly process. We
characterize which legal strings yield the same reduction graph (up to
isomorphism), and moreover we characterize which graphs are (isomorphic to)
reduction graphs.
Comment: 24 pages, 13 figures
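The basic notions behind legal strings are easy to state in code. A minimal sketch, assuming the usual definition (every pointer that occurs does so exactly twice, counting a pointer p and its inverse p̄ together) and a hypothetical notation in which a trailing apostrophe marks the inverse, e.g. `2'` for 2̄. A pointer is positive if its two occurrences have opposite orientations and negative otherwise; the string negative rule can be applied to a pointer whose two occurrences are adjacent and identical.

```python
from collections import Counter
from typing import Dict, List


def base(letter: str) -> str:
    """Strip the (hypothetical) inversion marker: "2'" -> "2"."""
    return letter[:-1] if letter.endswith("'") else letter


def is_legal(u: List[str]) -> bool:
    """Every pointer occurs exactly twice, in either orientation."""
    return all(c == 2 for c in Counter(base(x) for x in u).values())


def pointer_signs(u: List[str]) -> Dict[str, str]:
    """Classify each pointer of a legal string as positive or negative."""
    if not is_legal(u):
        raise ValueError("not a legal string")
    inverted: Dict[str, List[bool]] = {}
    for x in u:
        inverted.setdefault(base(x), []).append(x.endswith("'"))
    return {p: "positive" if a != b else "negative"
            for p, (a, b) in inverted.items()}


def snr_candidates(u: List[str]) -> List[str]:
    """Pointers whose two occurrences form an adjacent factor pp (or p'p')."""
    return [base(x) for x, y in zip(u, u[1:]) if x == y]


print(pointer_signs(["2", "3", "2'", "3"]))  # {'2': 'positive', '3': 'negative'}
```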
A New Simulated Annealing Algorithm for the Multiple Sequence Alignment Problem: The approach of Polymers in a Random Medium
We propose a probabilistic algorithm to solve the Multiple Sequence
Alignment problem. The algorithm is a Simulated Annealing (SA) algorithm that exploits
the representation of the Multiple Alignment between D sequences as a
directed polymer in D dimensions. Within this representation we can easily
track the evolution in the configuration space of the alignment through local
moves of low computational cost. Unlike other probabilistic
algorithms proposed to solve this problem, our approach allows for the creation
and deletion of gaps without extra computational cost. The algorithm was tested
by aligning proteins from the kinase family. When D=3, the results are consistent
with those obtained using a complete algorithm. For D>3, where the complete
algorithm fails, we show that our algorithm still converges to reasonable
alignments. Moreover, we study the space of solutions obtained and show that
depending on the number of sequences aligned the solutions are organized in
different ways, suggesting a possible source of errors for progressive
algorithms.
Comment: 7 pages and 11 figures
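The annealing loop itself can be sketched independently of the directed-polymer representation. A minimal sketch, assuming a gapped alignment stored as equal-length rows, a simple sum-of-pairs cost (1 per mismatching or residue-gap pair in a column), a local move that swaps one gap with a neighbouring character in one row, Metropolis acceptance and a geometric cooling schedule; all names and parameter values are illustrative. Unlike the algorithm described above, this move set keeps the number of gaps fixed rather than creating or deleting them.

```python
import math
import random
from typing import List, Tuple

GAP = "-"
Rows = List[List[str]]


def sp_cost(rows: Rows) -> int:
    """Sum-of-pairs cost: +1 for every mismatch or residue-gap pair in a column."""
    cost = 0
    for col in zip(*rows):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a != b and not (a == GAP and b == GAP):
                    cost += 1
    return cost


def propose(rows: Rows, rng: random.Random) -> Rows:
    """Local move: swap one gap with an adjacent character (residue order is kept)."""
    new = [r[:] for r in rows]
    row = new[rng.randrange(len(new))]
    gaps = [i for i, c in enumerate(row) if c == GAP]
    if gaps:
        i = rng.choice(gaps)
        j = i + rng.choice((-1, 1))
        if 0 <= j < len(row):
            row[i], row[j] = row[j], row[i]
    return new


def anneal(rows: Rows, t0: float = 2.0, cooling: float = 0.995,
           steps: int = 5000, seed: int = 0) -> Tuple[Rows, int]:
    rng = random.Random(seed)
    current, e_cur = rows, sp_cost(rows)
    best, e_best = current, e_cur
    t = t0
    for _ in range(steps):
        cand = propose(current, rng)
        e_new = sp_cost(cand)
        # Metropolis rule: accept improvements; accept worse moves with prob exp(-dE/T).
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / t):
            current, e_cur = cand, e_new
            if e_cur < e_best:
                best, e_best = current, e_cur
        t *= cooling
    return best, e_best


best, cost = anneal([list("ACG-T"), list("A-CGT"), list("ACGT-")])
print(["".join(r) for r in best], cost)
```

Because each move touches only one gap, its cost could in principle be updated incrementally from the two affected columns; the sketch recomputes the full cost for clarity.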
Thermodynamics of protein folding: a random matrix formulation
The process of protein folding from an unfolded state to a biologically
active, folded conformation is governed by many parameters, e.g. the sequence of
amino acids, intermolecular interactions, the solvent, temperature and chaperone
molecules. Our study, based on random matrix modeling of the interactions,
shows, however, that the evolution of the statistical measures (e.g. Gibbs free
energy, heat capacity and entropy) depends on a single parameter. This result can
explain the selection of specific folding pathways from an infinite number of
possible ways, as well as other folding characteristics observed in computer
simulation studies.
Comment: 21 pages, no figures
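To illustrate how such statistical measures follow from a random-matrix model, here is a generic sketch that is not the paper's formulation: it samples a real symmetric Gaussian (GOE-style) matrix as a stand-in interaction Hamiltonian, diagonalizes it with the classical cyclic Jacobi rotation method (pure Python, to stay dependency-free), and treats the eigenvalues as energy levels of a canonical ensemble with $k_B = 1$. The matrix size, variances and temperature are assumptions, and the free energy computed here is the canonical $F = -T \ln Z$.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def goe_matrix(n: int, seed: int = 0) -> Matrix:
    """Real symmetric Gaussian matrix: diagonal variance 1, off-diagonal variance 1/2."""
    rng = random.Random(seed)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = rng.gauss(0.0, 1.0)
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, math.sqrt(0.5))
    return a


def jacobi_eigenvalues(m: Matrix, tol: float = 1e-18, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def thermodynamics(levels: List[float], temperature: float) -> Tuple[float, float, float]:
    """Free energy F, entropy S and heat capacity C of a canonical ensemble (k_B = 1)."""
    e0 = min(levels)  # shift energies for numerical stability; undone in F below
    weights = [math.exp(-(e - e0) / temperature) for e in levels]
    z = sum(weights)
    probs = [w / z for w in weights]
    mean_e = sum(p * e for p, e in zip(probs, levels))
    mean_e2 = sum(p * e * e for p, e in zip(probs, levels))
    free = e0 - temperature * math.log(z)  # F = -T ln Z
    entropy = (mean_e - free) / temperature  # S = (U - F) / T
    heat_capacity = (mean_e2 - mean_e ** 2) / temperature ** 2  # C = Var(E) / T^2
    return free, entropy, heat_capacity


levels = jacobi_eigenvalues(goe_matrix(6))
for temp in (0.5, 1.0, 2.0):
    print(temp, thermodynamics(levels, temp))
```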
An Integrative Method for Accurate Comparative Genome Mapping
We present MAGIC, an integrative and accurate method for comparative genome mapping. Our method consists of two phases: preprocessing, which identifies “maximal similar segments,” and mapping, which clusters and classifies these segments. MAGIC's main novelty lies in its biologically intuitive clustering approach, which aims both to calculate reorder-free segments and to identify orthologous segments. In the process, MAGIC efficiently handles ambiguities resulting from duplications that occurred before the speciation of the considered organisms from their most recent common ancestor. We demonstrate both MAGIC's robustness and its scalability: robustness is assessed with respect to its initial input and its parameter values, and scalability by applying MAGIC to distantly related organisms and to large genomes. We compare MAGIC to other comparative mapping methods and provide a detailed analysis of the differences between them. Our improvements allow a comprehensive study of the diversity of genetic repertoires resulting from large-scale mutations, such as indels and duplications, explicitly including transposable and phagic elements. The strength of our method is demonstrated by detailed statistics computed for each type of these large-scale mutations. MAGIC enabled us to conduct a comprehensive analysis of the different forces shaping prokaryotic genomes from different clades, and to quantify the importance of novel gene content introduced by horizontal gene transfer relative to gene duplication in bacterial genome evolution. We use these results to investigate the breakpoint distribution in several prokaryotic genomes.
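The idea of a reorder-free segment can be illustrated with a much simpler heuristic than MAGIC's clustering. A minimal sketch, assuming anchors are given as hypothetical (index in genome A, index in genome B) pairs of matched genes: it splits them into maximal runs that are consecutive in A and advance by a constant step of +1 or -1 in B, the latter corresponding to an inverted segment. Duplications, which MAGIC handles explicitly, simply break runs here.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (gene index in genome A, gene index in genome B)


def collinear_segments(anchors: List[Anchor]) -> List[List[Anchor]]:
    """Split anchors into maximal runs that are consecutive in A and +1/-1 steps in B."""
    segments: List[List[Anchor]] = []
    for a, b in sorted(anchors):
        if segments:
            seg = segments[-1]
            last_a, last_b = seg[-1]
            step = b - last_b
            if a == last_a + 1 and step in (1, -1):
                # A one-anchor segment adopts the step; longer ones must keep it.
                if len(seg) == 1 or step == seg[1][1] - seg[0][1]:
                    seg.append((a, b))
                    continue
        segments.append([(a, b)])
    return segments


print(collinear_segments([(0, 10), (1, 11), (2, 12), (3, 5), (4, 4), (5, 3), (6, 20)]))
# [[(0, 10), (1, 11), (2, 12)], [(3, 5), (4, 4), (5, 3)], [(6, 20)]]
```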
Safe and complete contig assembly via omnitigs
Contig assembly is the first stage that most assemblers solve when
reconstructing a genome from a set of reads. Its output consists of contigs:
a set of strings that are promised to appear in any genome that could have
generated the reads. Since the introduction of contigs 20 years ago, assemblers
have tried to obtain longer and longer contigs, but the following question was
never answered: given a genome graph (e.g. a de Bruijn or a string graph),
what are all the strings that can be safely reported from it as contigs? In
this paper we finally answer this question, and also give a polynomial time
algorithm to find them. Our experiments show that these strings, which we call
omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of
dbSNP locations have more neighbors in omnitigs than in unitigs.
Comment: Full version of the paper in the proceedings of RECOMB 201
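For comparison with omnitigs, the unitig baseline is straightforward to compute: unitigs are maximal paths whose internal nodes have exactly one incoming and one outgoing edge. This is a minimal sketch on a plain directed graph given as an edge list (node-centric, no reverse complements); the name `unitigs` is hypothetical, and the omnitig algorithm from the paper is more involved and is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def unitigs(edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Maximal paths (as node lists) whose internal nodes have in-degree = out-degree = 1."""
    succ: Dict[str, List[str]] = defaultdict(list)
    indeg: Dict[str, int] = defaultdict(int)
    nodes: Set[str] = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    def simple(v: str) -> bool:
        return indeg[v] == 1 and len(succ[v]) == 1

    paths: List[List[str]] = []
    visited: Set[str] = set()
    # Start one unitig on every edge that leaves a non-simple node.
    for v in sorted(nodes):
        if simple(v):
            continue
        for w in succ[v]:
            path = [v, w]
            while simple(path[-1]):
                visited.add(path[-1])
                path.append(succ[path[-1]][0])
            paths.append(path)
    # Simple nodes never reached above lie on isolated cycles of simple nodes.
    for v in sorted(nodes):
        if simple(v) and v not in visited:
            cycle = [v]
            visited.add(v)
            nxt = succ[v][0]
            while nxt != v:
                visited.add(nxt)
                cycle.append(nxt)
                nxt = succ[nxt][0]
            paths.append(cycle + [v])
    return paths


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("e", "f"), ("x", "y"), ("y", "x")]
print(unitigs(edges))
# [['a', 'b', 'c'], ['c', 'd'], ['c', 'e', 'f'], ['x', 'y', 'x']]
```

Every unitig stops at the branching node `c`; omnitigs can extend safely across some such branch points.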